TweetNorm: Text Normalization on Italian Twitter Data
Authors
Abstract
This paper addresses the issue of text normalization on non-standard Italian data. We present TweetNorm, a system which normalizes Italian tweets so that the amount of microblog slang and distorted text is drastically reduced and the normalized output has a much cleaner and more formal style. The paper shows that a set of fixed language-independent rules, combined with trained rules for language-dependent abbreviation and acronym expansion, can achieve good results in normalizing Italian Twitter messages.
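As a rough illustration of the two rule types named in the abstract, the following minimal Python sketch pairs one fixed, language-independent rule with a language-dependent abbreviation lookup. Everything in it is an assumption made for illustration: the abstract does not list the system's actual rules, the function names are hypothetical, and the small hand-written table stands in for the expansion rules that the paper says are trained.

```python
import re

# Hypothetical abbreviation table (illustrative only). The paper describes
# trained expansion rules; this hand-written dictionary merely stands in.
ABBREVIATIONS = {
    "cmq": "comunque",
    "nn": "non",
    "xké": "perché",
    "tvb": "ti voglio bene",
}

# Three or more repetitions of the same letter (digits and punctuation are
# left alone, so "1000" and "..." are not touched).
ELONGATION_RE = re.compile(r"([^\W\d_])\1{2,}")


def apply_fixed_rules(text):
    """Language-independent clean-up: collapse elongated letters and whitespace."""
    text = ELONGATION_RE.sub(r"\1", text)
    return " ".join(text.split())


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Language-dependent step: replace whitespace-separated tokens found in the table."""
    return " ".join(table.get(token.lower(), token) for token in text.split())


def normalize(text):
    """Apply the fixed rules first, then the abbreviation expansion."""
    return expand_abbreviations(apply_fixed_rules(text))


if __name__ == "__main__":
    print(normalize("cmq   nn vengooooo"))
```

Collapsing an elongated run to a single letter is a deliberate simplification: it would damage a word whose standard spelling has a double letter inside the elongated run, which is one reason a real system would need more careful rules than this sketch.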
Similar resources
TweetNorm: a benchmark for lexical normalization of Spanish tweets
The language used in social media is often characterized by the abundance of informal and non-standard writing. The normalization of this non-standard language can be crucial to facilitate the subsequent textual processing and to consequently help boost the performance of natural language processing tools applied to social media text. In this paper we present a benchmark for lexical normalizati...
Automatically Extracting Variant-Normalization Pairs for Japanese Text Normalization
Social media texts, such as tweets from Twitter, contain many types of nonstandard tokens, and the number of normalization approaches for handling such noisy text has been increasing. We present a method for automatically extracting pairs of a variant word and its normal form from unsegmented text on the basis of a pair-wise similarity approach. We incorporated the acquired variant-normalizatio...
Elhuyar at Tweet-Norm 2013
This paper presents the system developed by Elhuyar for the TweetNorm evaluation campaign, which consists of normalizing Spanish tweets to standard language. The normalization covers only the correction of certain Out Of Vocabulary (OOV) words, previously identified by the organizers. The developed system follows a two-step strategy. First, candidates for each OOV word are generated by means of ...
IITP: Hybrid Approach for Text Normalization in Twitter
In this paper we report our work on the normalization of noisy text in Twitter data. The method we propose is hybrid in nature, combining machine learning with rules. In the first step, a supervised approach based on conditional random fields is developed, and in the second step a set of heuristic rules is applied to the candidate wordforms for the normalization. The classifier is trained with a ...
Text Analytics of Customers on Twitter: Brand Sentiments in Customer Support
Brand community interactions and online customer support have become major platforms for strengthening brand sentiment and creating loyalty. Rapid brand responses to each customer request through inbound tweets on Twitter and taking proper actions to cover the needs of customers are the key elements of positive brand sentiment creation and product or service initiative management in the realm of ...